Unsupervised EEG preictal interval identification in patients with drug-resistant epilepsy

Typical seizure prediction models aim at discriminating interictal brain activity from pre-seizure electrographic patterns. Given the lack of a preictal clinical definition, a fixed interval is widely used to develop these models. Recent studies reporting preictal interval selection among a range of fixed intervals show inter- and intra-patient preictal interval variability, possibly reflecting the heterogeneity of the seizure generation process. Obtaining accurate labels of the preictal interval can be used to train supervised prediction models and, hence, avoid setting a fixed preictal interval for all seizures within the same patient. Unsupervised learning methods hold great promise for exploring preictal alterations on a seizure-specific scale. Multivariate and univariate linear and nonlinear features were extracted from scalp electroencephalography (EEG) signals collected from 41 patients with drug-resistant epilepsy undergoing presurgical monitoring. Nonlinear dimensionality reduction was performed for each group of features and each of the 226 seizures. We applied different clustering methods in searching for preictal clusters located until 2 h before the seizure onset. We identified preictal patterns in 90% of patients and 51% of the visually inspected seizures. The preictal clusters manifested a seizure-specific profile with varying duration (22.9 ± 21.0 min) and starting time before seizure onset (47.6 ± 27.3 min). Searching for preictal patterns on the EEG trace using unsupervised methods showed that it is possible to identify seizure-specific preictal signatures for some patients and some seizures within the same patient.

1 Study assumptions on the analysed interval of data To minimise the influence of the postictal state on the EEG trace, we considered a postictal interval of 30 minutes. As such, when two seizures were separated by exactly 4.5 to 5 hours, we discarded data corresponding to a postictal interval of 30 minutes to 1 minute, respectively [1][2][3] .
The duration of the SPH, set to 10 minutes in this study [4][5][6] , was defined according to the future clinical application. Since we are analysing scalp EEG data, the treatment strategies were limited to (i) acute drug administration to prevent an imminent seizure or (ii) the patient taking action to avoid accidents resulting from seizure occurrence. Rescue medication takes effect at most five to ten minutes after administration (as is the case of diazepam rectal gel) [7][8][9] . Treatment strategies requiring intracranial device implantation to deliver brain electrical stimulation typically define SPH intervals of a few seconds. However, given the considerable differences between scalp and invasive EEG, neurostimulation was not considered in our range of future applications 10,11 . We then defined a longer SPH interval of 10 minutes so that, when envisioning future studies, seizure prediction models enable the patient to have enough time, after receiving the alarm, to prepare for an upcoming seizure 11 .
The following assumptions were considered when analysing the 4.5 hours of data recorded before an electrographic seizure onset: (i) Seizures separated by at least 4.5 hours were considered independent events.
(ii) Although 4.5 hours (270 minutes) of data were analysed, only the electrographic changes observed within the 120to 10-minute interval before the seizure onset were assumed to mainly represent the preictal state. Data locating before the 120-minute interval were considered as enough data to predominantly contain interictal data (see Fig. S1).
(iii) Given the higher probability of a preictal interval lasting less than an interictal interval, the cluster with the smaller number of samples represented the preictal interval (see Fig. S1).
(iv) This interval may not occur strictly near the seizure onset but could be captured as an EEG-related event eventually preceding the seizure onset (see Example 2 in Fig. S1).
2 Patient and seizure metadata Table S1 contains information regarding the group of patients with temporal lobe drug-resistant epilepsy analysed in this study. Data were recorded while the patients were under presurgical monitoring, being submitted to medication withdrawal as the activation procedure 12 . The table includes information on sex, age at hospital admission and onset age (corresponding to the occurrence of the first epilepsy event), aetiology, epilepsy foci lateralisation, the total number of annotated seizures and the number of seizures analysed for each patient (lead seizures), according to the considered minimum inter-seizure interval of 4.5 hours. Information on aetiology is presented according to the International League Against Epilepsy (ILAE) nomenclature 13,14 . Most data were collected from patients with hippocampal sclerosis; even tough epilepsies with other structural causes, such as tumours and malformations of cortical development, have also been identified. Only one patient has immune aetiology, specifically, inflammation of the autoimmune-mediated central nervous system. Table S2 contains a description of metadata annotated for each analysed seizure. This information, available in the EPILEPSIAE database 12,15 , was annotated by experienced professionals by inspecting both video and EEG data. The table then includes information about seizure type, vigilance state determined 10 seconds before the seizure onset, and EEG onset time over the 24h period. Seizures were classified according to the ILAE nomenclature 16 . The vigilance state corresponds to one of the following states of alertness and responsiveness: wakefulness, non-rapid eye movement sleep (NREM sleep, further subdivided into three sleep stages N1-3) and rapid eye movement (REM) sleep 17 . Information about the seizure type, vigilance state and EEG onset time is also depicted in Fig. S2.
Additionally, Table S2 comprises information regarding the percentage of time during which noisy segments have been identified in each seizure's 4.5 hours of EEG. These noisy segments do not contain neurological information, but rather flat lines, saturated signal or abnormal peaks, for instance, caused by electrode detachment.
Given the extension of this table, we decided to provide here the information regarding the final preictal intervals identified for each seizure (either category 3 or 6), which was later used to perform preictal interval comparison among studies (refer to sections 6.5 and 6.6).
It is important to note that when preictal patterns were observed for more than one group of features, a final interval had to be chosen. The final preictal interval was chosen according to its characteristics. The first criterion was based on the preictal density as a preictal interval with higher density means that the vast majority (or even all) of the samples belong to the preictal state and there are fewer "jumps" to the remaining clusters. A preictal interval associated with a higher density means that a more evident and permanent change occurs before the seizure onset 18 . The second criterion, used when two preictal intervals had the same density, consisted in choosing a preictal interval based on the duration. In other words, for preictal intervals with different duration, we chose the one with the highest duration as it provided more statistical confidence in the presence of a preictal state 18 . Lastly, when after these two criteria, we still had two preictal intervals with the same density and duration; a third criterion was used to select the final interval. Namely, the third criterion consisted of choosing preictal intervals starting near the seizure onset. This means that the patient has to wait less time for the seizure to occur, reducing the impact of larger waiting times on the patient's anxiety levels.

EEG feature engineering
There is a vast amount of literature spanning the different groups of features typically extracted from EEG. The next subsections describe the univariate linear, univariate nonlinear and multivariate features considered in this study according to the state of the art 19,20 . The frequency bands considered in univariate linear and multivariate feature extraction comprise delta (0.5 to <4 Hz), theta (4 to <8 Hz), alpha (8 to <13 Hz), beta (13 to <30 Hz), and gamma (30 to <47 Hz) [19][20][21] .
The upper limit of the gamma band was defined to mitigate the impact of muscle artefacts in the analysis. Contrarily to intracranial EEG, with scalp EEG, it is more difficult to accurately capture lower amplitude faster oscillations characteristic of beta and gamma bands. Additionally, the large number of muscle artefacts in scalp EEG overshadows the real quantification of gamma rhythms due to the overlapping frequency spectrum 22,23 . During the EEG preprocessing step, there were 10-minute segments for which brain information was maintained at the cost of not removing some of the artefacts (mainly muscle artefacts). As such, spectral information above 47 Hz was left out of the analysis to reduce the impact of muscle artefacts in the results.

Univariate linear features
A total of 42 features were extracted from each 5-second window, and each EEG channel 19-21 .

Statistical measures
Five statistical measures were considered in this study: normalised and non-normalised (with respect to the maximum value in each window) mean amplitude 5 , standard deviation, skewness, and kurtosis 19,24 . These measures are meant to capture information regarding the amplitude distribution of a given time series. Symmetric and asymmetric amplitude distributions translate to zero and non-zero skewness, respectively. The relative flatness (or peakedness) of the amplitude distribution is reflected in the value of kurtosis 20 .

Hjorth parameters
Hjorth parameters, activity, mobility and complexity, were conceptualised as clinically useful tools to quantitatively describe the EEG 20 . Activity is given by the variance of a given time series, y(t): Mobility corresponds to the variance of the slopes of a time series normalised by the variance of that time series: Complexity quantifies the variance of the rate of slope changes of a time series with reference to an ideal sine curve. The more similar the time series is to a pure sine wave, the more approximate will be the value of complexity to 1.

Decorrelation time
The decorrelation time corresponds to the time at which the first zero-crossing of the autocorrelation function occurs. The samples in a time series are less correlated as the time of the first zero-crossing approaches zero. The decorrelation time is, therefore, an indicator of signal periodicity. Considering the extreme case of a white noise signal, it theoretically presents a zero value of decorrelation time 25, 26 .
■ Spectral power in each frequency band (delta, theta, alpha, beta, and gamma).
■ Relative spectral power in each frequency band (delta, theta, alpha, beta, and gamma).

Spectral edge frequency and power
Typically, the power spectrum of an EEG signal is characterised by a predominance of power in the 0 to 40 Hz frequency band. The spectral edge frequency, f 50 , is a measure of the power spectrum distribution that consists of the minimum frequency for which it is possible to obtain 50% of the spectral power up to 40 Hz, P 40 Hz 20 .
The spectral edge power corresponds to the power spectrum area below the spectral edge frequency 25,26,28 .

Energy of wavelet coefficients
The wavelet transform has been used as an alternative to the FFT analysis as it allows for multiresolution time-frequency decomposition. Particularly, the discrete wavelet transform (DWT) decomposes a given time series into approximation and detail coefficients yielding the first level of decomposition. The approximation coefficients in every level are further decomposed into the next level of approximation and detail coefficients. The DWT coefficients are obtained by applying the mother wavelet to a given time series at different translations and scales. The first levels correspond to the time series' high-frequency content, whereas the last levels contain low frequencies 25,[29][30][31] . Given that the dataset under analysis contains data sampled at 256 Hz, we applied the DWT decomposition using Daubechies (db4) 25

Univariate nonlinear features
A total of 29 univariate nonlinear features were extracted from each 5-second window and EEG channel:

Higuchi's fractal dimension
Higuchi's fractal dimension (HFD) is a fractal measure of the irregularity and self-similarity of a given signal 31,33,34 . HFD corresponds to the slope of the linear fit between a log-log plot of the length and different scales of a given 5-second EEG window. This feature requires the definition of a free parameter (k max ) corresponding to the maximum number of scales that have been analysed. In our study, this parameter was set to 100 as a result of an estimation process that evaluated a range of k max values and assessed when the corresponding values of fractal dimension reached a plateau [33][34][35] .

Monofractal detrended fluctuation analysis
Monofractal detrended fluctuation analysis (DFA) was performed to explore the extent of long-range correlations in the EEG 5-second windows for different time scales 36,37 . Two scaling exponents were returned from DFA analysis: (i) DFA slope α 1 (DFA α 1 ) corresponding to short-term fluctuations, within the 10-32 sample range and (ii) DFA slope α 2 (DFA α 2 ) corresponding to long-term fluctuations, over the 32-128 sample range 36 .

Multifractal detrended fluctuation analysis
Multifractal detrended fluctuation analysis (MFDFA) is a generalisation of the DFA method, which is computed for a single scale or fractal dimension. This method explores the possibility that different fractal patterns may describe the fractal structure of the EEG segments. Three measures were obtained from the multifractal spectrum: width, the abscissa value of the apex, and the asymmetry parameter 36,38,39 . The asymmetry parameter measures the multifractal spectrum symmetry: a symmetric spectrum corresponds to a zero value of the asymmetry parameter; an asymmetric spectrum that is left-or right-skewed yields a positive or negative value of the asymmetry parameter 40 .

Multifractal 1-D Wavelet Leader estimates
Multifractal 1-D Wavelet Leader estimates is an alternative method based on wavelet analysis to estimate the multifractal spectrum 41 . We extracted 11 measures characterising the multifractal spectrum, [41][42][43][44] as can be seen in Fig. S3 (e)-(f). An asymmetrical spectrum is obtained when the structure of the time series is not sensitive to the local fluctuations with (i) large magnitudes (long right tail) or (ii) small magnitudes (left long tail). When the time series contains high and low fluctuation components presenting a similar scaling complexity, the multifractal spectrum takes a symmetrical shape 36,43 .

Approximate and sample entropies
Approximate and sample entropies quantify the regularity and the complexity of a time series [45][46][47] . By inspecting these features, it is possible to know the likelihood that similar sequences found for m points, within a tolerance, r, will also be found for m + 1 points. Both measures require the definition of the number of points, m, comprised in the sequences further compared, and the tolerance value, r, for which matches are accepted. Parameter m is equal to 2 46 . The tolerance, taken as the similarity criterion, was set to 0.2 × SD, SD corresponding to the standard deviation of the 5-second EEG window (of length N) 46, 48-50 .

Correlation dimension and largest Lyapunov exponent
The correlation dimension (CD) and the largest Lyapunov exponent (LLE) were obtained from the reconstruction of the underlying m-dimensional dynamical system. The former quantifies the complexity of a system, whereas the latter provides information regarding the overall predictability of that system, i.e., the evolution of the trajectories in the phase space. Periodic signals are associated with null values of LLE, whereas chaotic systems display increased values of the LLE 51 . Similarly, an increase in complexity corresponds to an increase in the value of CD 51 . These measures were obtained through the reconstruction of the underlying system's phase space. In this study, the two parameters required for phase space reconstruction, the embedding dimension, m, and the time delay, τ, were estimated (for each 5-second window) using the False Nearest Neighbour algorithm and the first local minimum of the average mutual information method, respectively (refer to Fig. S4) [52][53][54] . The LLE was then obtained using the Rosenstein et al. (1993) 55 method, which is one of the most widely used for this purpose, whereas CD was computed based on the Grassberger & Procaccia (1983) 56 method.

Recurrence quantification analysis
A recurrence quantification analysis (RQA) provides information regarding hidden periodicities in the aforementioned phase space trajectory. Such information can be accessed by computing the recurrence plot (RP) of a given time series 57 . The first step to obtaining such representation is to compute the square matrix with dimensions N ×N (known as the colour recurrence plot), containing the pairwise Euclidean distance (given by the norm ||·||) between all samples of the trajectory, N, in the m-dimensional space. Afterwards, a threshold distance ε is used to define a sphere centred at the state x i . If x j falls within that sphere, then the Heaviside function Θ(·) decides for R i, j = 1, meaning the states are close to each other. Otherwise, R i, j = 0. This binary matrix, symmetric along the identity line, can therefore be visualised in a black (R i, j = 1) and white plot (R i, j = 0), the RP. In this study, the value of ε was estimated for each 5-second non-overlapping window, corresponding to 10% of the maximum phase space diameter 58

Graph indexes for undirect connectivity measures
With the exception of circular omega complexity (which is already a multivariate measure of connectivity), all the remaining bivariate features were analysed using the following graph measures. These are measures of local and global connectivity 62, 64, 67, 68 : ■ Assortativity (A).

Graph indexes for direct connectivity measures
The following graphs measures were extracted from the connectivity matrix of the phase slope index bivariate feature 67 : ■ Assortativity (A).

Feature significance in seizure prediction studies
Most of the univariate linear features described above have been widely used in the context of seizure prediction. For instance, statistical measures have been shown to significantly change during the preictal period compared to the interictal state 19,24,27,28 . The preictal interval has been associated with a decrease in the variance, and an increase in the kurtosis 19 . Hjorth parameters (mobility and complexity) were also reported to increase during the preictal interval 24 . The decorrelation time was documented to decrease near the seizure onset 19 . Pinto et al. concluded that Hjorth mobility and skewness were highly discriminating features, often selected by their evolutionary seizure prediction model 6 . FP1  FP2  F3  F4  C3  C4  P3  P4  O1  O2  F7  F8  T7  T8  P7  P8  FZ  CZ  PZ   FP1  FP2  F3  F4  C3  C4  P3  P4  O1  O2  F7  F8  T7  T8  P7  P8  FZ  Extracting frequency-domain univariate linear features has been extensively performed in seizure prediction 19 . Mormann et al. 24 showed a decrease in delta band power during the preictal period, accompanied by a relative increase of power in the other subbands. Park et al. documented gamma frequency bands to be the most discriminating features when classifying interictal and preictal samples 69 . Pinto et al. reported theta band relative power and mean normalised frequency as the most frequently extracted features by their seizure prediction model. Spectral edge frequency at 75% of the spectrum was later reported by the same authors to arise as a highly discriminating feature in seizure prediction 6 .

ISPC theta
Additionally, univariate nonlinear features have been investigated in seizure prediction. The largest Lyapunov exponent, correlation dimension, fractal dimension, recurrence quantification analysis, and approximate and sample entropies are frequently considered to develop prediction models 19,20,31,47,70,71 . Mormann et al. 24 observed an increase of the largest Lyapunov exponent 30 minutes before seizure onset. In this study, we extracted different measures of fractal dimension. Higuchi's fractal dimension has already been used in the context of seizure prediction 31,35 . The detrended fluctuation analysis (DFA) has been mainly used to measure the long-range correlation of the electrocardiogram 72 . In epilepsy, DFA has often been considered to identify the disease characteristics 73,74 and for the noninvasive localisation of the epileptic zone before presurgical evaluation 75 . More recently, some attempts have been documented on the use of DFA in seizure prediction studies 39,76 . Initially, the monofractal perspective on the EEG oscillations was typically addressed by a monofractal DFA analysis. However, the nonstationary nature of EEG demands a multifractal DFA that contemplates the existence of large and small fluctuations 40,73 . Besides adding a multifractal DFA, we also considered the method of multifractal 1-D Wavelet Leader estimates. This decision was supported by the simplicity of the methods' implementation (we used the dwtleader Matlab function) associated with the potential of simultaneously characterising multifractal properties and exploring wavelet self-similarity structures 41 .
Even though uncertainty exists as to whether nonlinear features bring discriminatory power or not, there are some studies indicating that higher prediction performance can be obtained by combining linear and nonlinear features 77 . Moreover, a method that is able to capture the nonlinear nature of EEG signals may provide valuable insight into brain dynamics 31 .
The inspection of multivariate features in this study is supported by some studies reporting increased prediction performance when using bivariate features, in comparison with univariate linear features 24,78 . Although we are analysing multivariate features, these are global measures computed from graphs of bivariate measures. It is important to note that using bivariate measures as input to the feature reduction methods would result in a considerable increase in the feature reduction computational time (as a total of 7695 bivariate features would be analysed).
Ultimately, when studies conducted a feature selection step in the seizure prediction methodology, high inter-patient variability is often observed 28,78 . This further motivates the inclusion of different types of features in our study.

EEG feature data preparation
After feature extraction, we inspected the feature datasets obtained for each feature group (univariate linear, univariate nonlinear, and multivariate) and each seizure. Specifically, we searched for constant and quasi-constant features in the three feature groups. Constant (corresponding to features that have the same value for all 5-second windows) and quasiconstant features (corresponding to features for which more than half of the values are equal) were discarded from further analysis.
Given that we searched for constant and quasi-constant features for each seizure independently, the feature dataset after discarding constant and quasi-constant features could differ among seizures. That was only verified for the multivariate feature group. In fact, no constant features were found for univariate linear and univariate nonlinear features. Alpha peak frequency, decorrelation time and spectral edge frequency (at 50%) were the quasi-constant features removed over all channels and seizures from the univariate linear feature group. The remaining univariate linear features were selected (not discarded) over all channels and seizures. RQA feature L max was the only quasi-constant feature found across all channels for the univariate nonlinear feature group. The remaining univariate nonlinear features were selected over all channels and seizures.
For the multivariate feature group, we observed that the feature dataset (resulting after discarding constant and quasiconstant features) would differ from seizure to seizure. Based on that, we provide information regarding the constant, quasi-constant and features that showed across all seizures in Table S3. We counted a total of 15 (3.0%), 111 (22,4%), and 216 (43.6%) constant, quasi-constant, and selected (over all channels and seizures) multivariate features, respectively.

Unsupervised learning methods
The following methods were chosen to search for the preictal interval in the EEG-based feature dataset: 1. K-means clustering, a widely used clustering partitioning method, is better suited to detect well-separated and similarly sized and shaped clusters 79,80 . The initial cluster centroids were selected using the k-means++ method 81 and the distances between points and centroids were computed using the Euclidean distance. The algorithm was run for k clusters, with k = 2, 3, 4.
2. Agglomerative hierarchical clustering can identify structured clusters. We used the Ward linkage method and the Euclidean distance as the dissimilarity measure 82 . The algorithm was run for k clusters, with k = 2, 3, 4.
3. Hierarchical density-based spatial clustering of applications with noise (HDBSCAN) performs DBSCAN over varying values of ε, which is an input argument that defines the maximum distance that can exist between two points within the same cluster 83 . HDBSCAN automatically selects the optimal clustering solution, requiring the definition of the number of samples in a neighbourhood, MinPts, for a point to be considered a core point 84,85 . This parameter was set to six, corresponding to twice the dimensionality of the feature space 86 . An optional input parameter, minimum cluster size, MinSz, was set to 20 samples 18 . Density-based clustering algorithms can be used to identify arbitrarily shaped clusters 83, 85, 86 . 4. Expectation-maximisation clustering using Gaussian mixture models 87 , besides successfully identifying round clusters, is also used to find elongated clusters that follow Gaussian distributions. Mean and standard deviation are estimated using the expectation-maximisation algorithm. The algorithm was run for k mixture components (or underlying Gaussian distributions), with k = 2, 3, 4.

Results for unsupervised learning
This section presents the results obtained after performing feature dimensionality reduction and clustering tasks.

Results for clustering solution categorisation
Categorisation performed by each of the five members comprising our epilepsy research team is presented for multivariate (Table S4), univariate linear (Table S5), univariate nonlinear (Table S6), and control univariate linear reduced data (Table S7). Fig. S6 depicts the number of seizures for which less than 3, 3, 4, or 5 votes have been observed after the categorisation task. When less than 3 votes would be obtained for a given seizure, the expert team would gather and discuss over the category that should be assigned to that seizure. Importantly, the figure also shows the number of seizures that were assigned a given category with the vote of Expert 1, who performed the first visual inspection. Fig. S7 depicts the number of categories that resulted from this team discussion.     Feature group Frequency Fig. S7. Number of categories assigned after team member discussion over the categorisation for which less than three votes were obtained.

Results for control intervals
In this section we present the results for the analysis of control intervals. Figures  In (a) the 4.5-hour interval was analysed for all seizures while only two 4.5-hour control intervals were analysed. In (b) the 4.5-hour interval was analysed for six seizures while no 4.5-hour control intervals were analysed. In (c) the 4.5-hour interval was analysed for four seizures while only one 4.5-hour control interval was analysed. Table S8 shows the categories assigned to the two 4.5-hour intervals analysed for each seizure: the one preceding the onset and the control interval. This analysis was conducted for the univariate linear feature group. The seizures categorised as 3 or 6 in the 4.5-hour interval preceding the onset are indicated in bold.

Prevalence of clustering methods
Fig. S13 presents the prevalence of the clustering methods explored in this study. We assessed the frequency of each method for each group of features. The analysis was conducted considering all categories and only categories 3 and 6 (categories representing putative preictal activity). Fig. S14 presents the mean and standard deviation of the Dunn's index computed across seizures, for each feature group, category, and clustering method. We also present the values for all categories. Fig. S15 H  D  B  S  C  A  N  D  B  S  C  A  N  A  g  g  l  o  m  e  r  a  t  i  v  e  h  i  e  r  a  r  c  h  i  c  a  l  K  -m  e  a  n  s  G  M  M   M  u  l  t  i  v  a  r  i  a

Preictal comparison between EEG and ECG
Fig. S16 presents information regarding the existence of a putative preictal state in EEG (reported in the present study) and ECG (reported in 18 ) data. Additionally, the identified preictal intervals were also characterised in terms of starting time before seizure onset for each modality (see Fig. S17). The preictal intervals identified in the ECG data were selected according to the time continuity and duration (as reported in Leal et al. 18 ).
In the case of EEG, the preictal intervals represented in this section were found for either category 3 or 6. When preictal patterns were observed for more than one group of features a final interval was chosen according to the density, duration and starting time before seizure onset (as explained in section 2).  Fig. S17. Results for preictal interval characterisation in terms of starting time before seizure EEG onset, when preictal patterns were found in the analysed EEG lead seizures. Dots correspond to the values of preictal starting time. Solid and dashed lines indicate medians and means, respectively. Box's tops and bottoms indicate the 75th and 25th percentiles, respectively. Whiskers refer to the span of preictal starting time after discarding outliers. 6.6 State-of-the-art preictal comparison  Table S9 present the comparison between the preictal intervals obtained using unsupervised learning and the preictal intervals obtained using grid-search supervised learning. This comparison was made for two studies 5, 6 reporting preictal grid-search during seizure prediction model training using the EPILEPSIAE database. The authors provided identification numbers for each patient, allowing for a patient-wise comparison of preictal starting time. In Pinto et al. 2021 study 5 , the authors present the training results for different values of SOP in their supplementary material. The values we used to perform the study comparison correspond to the average preictal (SOP plus 10 minutes SPH) interval with the highest value of training fitness. Additionally, when the fitness values were equal for different average preictal intervals, we chose the average preictal interval that starts closer to the seizure onset, as it means that the patient has to wait less time for a seizure to occur.
The average preictal interval found using unsupervised learning was obtained for each patient by averaging over the starting time of the identified preictal intervals (which were often not found for all seizures of a patient). The criteria to get a final preictal interval when preictal intervals were found for more than one group of features was clarified in section 2.

Metadata analysis
The metadata analysis was performed for each feature group to infer about the influence of seizure-characterising variables on the preictal identification results. As shown in Table S2, vigilance state, onset hour, and ILAE seizure classification are categorical variables, whereas the noise variable is a continuous numerical variable. First, we computed the Kruskal-Wallis statistical test between the pairs of categorical and numerical variables indicated in Table S10. The returned p-values indicated that the Kruskal-Wallis test rejected the null hypothesis that the pairs of variables came from the same distribution at a 1% significance level.
The bias-corrected Cramér's V measure was computed to verify the association between each pair of categorical variables. Cramér's V values vary from 0 (corresponding to no association between the variables) to 1 (complete association). The Cramér's V measure corresponds to the absolute value of the phi coefficient when the two variables under study are binary variables. The results in Table S10 show that no pair of categorical variables verifies any association. A second analysis (see Table S11) was performed to evaluate the possible association between the continuous preictal characteristics (duration, density, and starting time) and the categorical (vigilance state, ILAE classification, and onset hour) and continuous (percentage of noise) metadata variables.
The results for the Kruskal-Wallis statistical test between the pairs of categorical and continuous variables indicate that the null hypothesis that the pairs of variables came from the same distribution was rejected for all but one pair of variables, at a 5% significance level. Even though the null hypothesis has not been rejected when Kruskal-Wallis statistical test was applied to the pair onset hour and multivariate preictal duration, it is important to note that this group contains 33 samples (preictal was found for the data of 33 seizures) and 16 categories (16 discrete hours from the 24 hour discretisation period where seizure onset occurred).
Finally, we computed the Pearson's correlation coefficient between the continuous variables: the percentage of noise and each of the preictal characteristics (starting time, duration, and density) found for the seizures assigned categories 3 and 6 (see Table S11). The results indicate that no correlation was found between the obtained preictal characteristics and the percentage of noise in the EEG signals.